STRUCTURAL SUSCEPTIBILITIES IN TOY MODELS OF PROTEINS 
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New definitions of tiie structural susceptibilities based on the fluctuations of distances to the native 
state of toy protein models are proposed. The calculation of such susceptibilities does not require 
the basin of native state and the folding temperature can be defined from their peak if the na- 
tive conformation is compact. The number of peaks in the derivatives of distances to the native 
state with respect to temperature, when plotted versus temperature, may serve as a criterion for 
foldability. The thermodynamics quantities are obtained by Monte Carlo and molecular dynamics 
simulations. 

PACS Nos. 71.28.+d, 71.27.+a 



Understanding of many aspects of protein folding has 
been recently advanced through studies of toy lattice 
models |l|,y]. A more realistic modelling, however, re- 
quires considering off-lattice systems. In lattice models, 
the native state is usually non-degenerate and it coin- 
cides with the ground state of the systems. In the case of 
off-lattice models the native state has a zero measure and 
delineating boundaries of the native basin in off-lattice 
systems is vital for studies of almost all equilibrium and 
dynamical properties. For instance, stability of a protein 
is determined by estimating the equilibrium probability 
of staying in the native basin: the temperature at which 
this probability is ^ defines the folding temperature, Tf. 

In most studies, such as in Q-^, the size of a basin, 
5c, is declared by adopting a reasonable but ad hoc cut- 
off bound. In Ref. Q, for instance, the folding kinetics 
are studied by monitoring the number of native con- 
tacts. The definition of the native contacts remains, 
however, ambiguous because it depends on a choice of 
the cutoff distance. We have developed two systematic 
approaches to delineate the native basin . One of them 
is based on exploring the saddle points on selected tra- 
jectories emerging from the native state. In the second 
approach, the basin is determined by monitoring ran- 
dom distortions in the shape of the protein around the 
native state. It should be noted that the implementa- 
tion of these methods becomes difficult in the case of 
long chains. The question we ask in the present paper 
is what one can learn about the folding thermodynamics 
and the foldability of the off-lattice sequences without 
the knowledge of Sc- 

We start our discussion by introducing the following 
distances to the native state 
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Here dij = jr^ — r-,| are the monomer to monomer dis- 
tances in the given structure, A'^ is a number of beads. 
The subscripts d, ba and da refer to the distances, the 
bond angles and the dihedral angles, respectively. The 
superscript NAT corresponds to the native state. The 
bond angle, 9i, is defined as the angle between two suc- 
cessive vectors Vi and Vi+i, where Vi = r^+i — r^. The 
dihedral angle, 4>i, is the angle between two vector prod- 
ucts Vi-i X Vi and Vi x Vi+i- The angular distances to the 
native state have not been studied in previous works. 

We define the structural susceptibilities corresponding 
to the distances (1) as follows 
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where the angular brackets indicate a thermodynamic 
average. As one can see below, these three susceptibil- 
ities behave qualitatively in the same way. The sharp- 
ness of their peaks may be, however, different (see, for 
instance. Fig. 4) and it is useful to calculate all of them. 
In the case of an off-lattice model the departures of 
the sequence geometry from its native conformation are 
usually described through the structural overlap function 
ias 
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where 0{x) is the Heaviside function. The overlap struc- 
tural susceptibility, Xo, is then defined as the thermal 
fluctuation of Xs 



Xo =<5l> ~ <So>^ 
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The maximum in Xo, when plotted against T, may be 
interpreted as a signature of the folding temperature Tf 
[g|,^ . The advantage of the new definitions of the struc- 
tural susceptibilities (2) compared to Xo is that the na- 
tive basin 6c is not involved in their computation. 



We have also studied the foUowing derivatives of dis- 
tances with respect to T 
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where Rg is the gyration radius. Naively one can expect 
that the peaks of the derivatives D, when plotted against 
T, would coincide with those of the corresponding sus- 
ceptibilities X- It is, however, true only when the native 
conformations are compact. 

Using the Monte Carlo and the molecular dynamics 
simulations we have demonstrated that Tf locates at the 
peaks of Xd (Dd), Xba {Dba) or Xda {Dda) provided the 
native conformations are compact. Thus, the determi- 
nation of Tf does not require the native basin 5c- This is 
the main advantage of the new quantities given by Eqs. 
(2) and (5). 

The situation becomes more complicated when the na- 
tive conformations are not compact. In this case the na- 
tive basin is necessary for the accurate estimate of Tf. 
The information about the foldability may be, however, 
obtained without 5c monitoring the temperature depen- 
dence of Z?d, Dba and Dda- Namely, a good folder would 
have only one peak in Dd,Dba or Dda, when plotted 
against temperature, whereas a bad folder would have 
two peaks. This may serve as a criterion to distinguish 
the good folders from the bad ones. 

We focus on four sequences whose native conforma- 
tions are shown in Fig.l. The 27-monomer lattice chain, 
L27, is a Go sequence [|oi. Its Hamiltonian is as follows 
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where Ajj = 1 if monomers i and j are in contact and 
otherwise. The quantity aij = — 1 if monomers 



M] 



i and j are in contact in the native conformation and 
ail = otherwise. We use L27 to check the behavior 

5ha {Dha) for the 



^i] 



and 



of the new quantities 5ba {Dba) 
lattice models. 

The sequences denoted by G and R' are two- 
dimensional versions of the model introduced by lori, 
Marinari and Parisi pi[ . The Hamiltonian is given by 



H = J2{k{d 
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where i and j range from 1 to A'^=16. dij is measured 
in units of cr, the typical value of which is a = 5A. We 
take dp to be equal to 2^^^a and 1.16ct for G and R' , re- 
spectively [|l2[. The harmonic term in the Hamiltonian, 
with the spring constant k, couples the beads that are 
adjacent along the chain. The remaining terms represent 
the Lennard- Jones potential. Random values of Aij de- 
scribe the quenched disorder. In Eq. (7) e is the typical 



Lennard- Jones energy parameter. We adopt the units in 
which G—1 and consider k to be equal to 25e. Smaller 
values of k may violate the self-avoidance of the chain. 
The coupling constants Aij for system R' are listed in 
Ref. |1^. These are shifted Gaussian-distributed num- 
bers with the strongest attracting couplings assigned to 
the native contacts. For system G, Aij is taken to be 1 
or for the native and non-native contacts respectively. 
System R' has been shown to be structurally overcon- 
strained and hard to fold. 
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FIG. 1. Native conformations of four sequences studied in 
this work. 

The helical system H has a native state that mimics 
typical a-helix secondary structures. In this case the 
distances between beeds are assumed to have the length 
do = 3.8A. As one proceeds along the helix axis from one 
bead to another, the bead's azimuthal angle is rotated 
by 100° and the azimuthal length is displaced by 1.5 A. 
The Hamiltonian used to describe the helix is a Go-like 
modification of Eq. (7) and it reads |l3] 
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The first term is a backbone potential which includes the 
harmonic and anharmonic interactions 
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We take dp = 3.8A, ki = e and k2 = lOOe. The interac- 
tion between residues which form native contacts in the 
target conformation is chosen to be of the Lennard- Jones 
form 
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We choose cr^ so that each contact in the native struc- 
ture is stabiHzed at the minimum of the potential, i. e. 
aij = 2~^/^dfj, where dfj is the length of the corre- 
sponding native contact. Residues that do not form the 
native contacts interact via a repulsive soft core potential 
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Here cto = "2^^^^ dcut, dcut — 5.5A. The difference be- 
tween a Go and Go-like sequences is in the choice of the 
non-native contact interaction energy which is taken to 
be zero for the Go sequence and nonzero for the latter 
one. 
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FIG. 2. The temperature dependence oi C'v, x ^-nd D for 
sequence L27. The arrow corresponds to Tf. Tf is defined as 
a temperature at which the probability of being in the native 
state is 1/2. C'v and Dg are denoted by dotted lines whereas 
Xo and Do - by thick lines. The results are averaged over 50 
starting conformations. The error bars are smaller than the 
symbol sizes. 

The thermodynamics of L27 is studied by a Monte 
Carlo procedure that satisfies the detailed balance con- 
dition ||lj,|5[. The dynamics allows for single and two- 
monomer (crankshaft) moves. For each conformation of 
the chain one has A possible moves and the maximum 
value of A^ Amax, is equal to Amax = N + 2. In our 
27-monomer case Amax — 29. For a conformation with 
A possible moves, the probability to attempt any move 



is taken to be A/ A^ax and the probability not to do any 
attempt is 1- A/ Amax |15| . In addition, probability to do 
a single move is reduced by the factor of 0.2 and to do 
the double move - by 0.8 Q. The attempts are rejected 
or accepted as in the standard Metropolis method. The 
equilibration is checked by monitoring the stability of 
data against at least three-times longer runs. We have 
used typically 10^ Monte Carlo steps (the first 5 x 10^ 
steps are not taken into account when averaging). 
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FIG. 3. The same as in Fig. 2 but for sequence G. The 
native basin defined by the shape distortion approach [7] is 
equal to &c = 0.2a". The results are averaged over 100 molec- 
ular dynamics trajectories. 

Fig. 2 shows the temperature dependence of Cu, % 
and -D for L27, where x (-D) is a common notation for 
Xd {Dd), Xba (Dba), Xda {Dda) and Xo (Do). In this 
on-lattice case the overlap structural susceptibility Xo is 
also given by Eq. (4) but 60 reads as follows |g] 

60 ^ 1- ,., L. . . E ^(d,,-<^^). (13) 



N^ -3N + 2 
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For sequence L27 the peaks of all quantities are located 
at T = Tf . The fact that the maxima of Xo and Dg are 
located at the same position has been also observed for 
some on-lattice sequences |1^. Our new result is that, 
similar to Xo, the susceptibilities based on the fiuctu- 
ations of the distances to the native conformation and 
D also give a correct position for Tf. According to the 
thermodynamic criterion [| 16|, L27 should be a good 
folder because Tf coincides with the collapse tempera- 
ture Tg [Tg is defined as a temperature where Cu devel- 
ops a peak). 



In order to study the time evolution of the ofF-lattice 
sequences G,R' and H, we use the equations of motion 
by the Langevin uncorrelated noise terms: 
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Here ks and F are the Boltzmann constant and the 
kinetic coefficient, respectively. Equation (14) is in- 
tegrated by the fifth order predictor-corrector method 
| jl8[ . The integration step is chosen to be O.OOSr, where 
r — ma^ / e is the characteristic time unit and m is the 
mass of a bead. We take F equal to 2. In the following, 
the temperature will be measured in the reduced units 
of e/fc_B. 




FIG. 4. The same as in Fig. 2 but for sequence R'. The 
native basin is equal to 5^ = 0.09cr. The results are averaged 
over 170 molecular dynamics trajectories. 

The folding properties of G, R' and H were character- 
ized in detail previously [|lj,|3|. One of them, R\ is a 
bad folder and two others are good folders. We calculate 
the thermodynamic quantities of G, R' and H by aver- 
aging over many molecular dynamics trajectorjes using 
the native state as the starting configuration to make 
sure that the evolution takes place in the right part of 
the phase space [|l2[. For all of these sequences, the time 
used for averaging in each trajectory is 4000r for each 
temperature. The first 2000t are discarded. 

Fig. 3 and 4 show the results for G and R' . Since 
these sequences are two-dimensional, Xha and Dha cor- 
responding to the dihedral angles do not appear. The 



basin was obtained by the shape distortion approach 
and is equal to 5c = 0.2cr and 5c — 0.09cr for G and 
R\ respectively. Within the error bars of 0.02 all of the 
maxima of Xi D and Gy are located at the folding tem- 
perature Tf (Tf = 0.24 ± 0.02 and Tj = 0.10 ± 0.02 for 
G and R\ respectively). Therefore, the determination of 
Tf does not require the native basin because it is enough 
to find the peak of x (or of D) in which 5c is not involved. 

For R\ Xo has only one peak at Tf whereas Xd and 
Xba have an additional maximum at T = Tg. Therefore, 
the advantage of Xd and Xba compared to Xo is that they 
allow to find not only Tf but also Tg. Since the max- 
imum of Dg is broad around the folding temperature, 
it is better to locate Tf as a second peak of Xd {Dd) or 
Xba [Dba)- It demonstrates the another advantage of the 
new quantities compared to the standard quantity Dg. 

It should be noted that the behavior of Xd and Xba 
is qualitatively the same but there is a quantitative dif- 
ference in the sharpness of their peaks. It is clear from 
Fig. 4 that at T = Tf the maximum of Xba is more 
pronounced compared to that of Xd- An opposite sit- 
uation takes place at T — Tg. So, the study of all of 
susceptibilities would help us to isolate peaks better. 

The fact that xo has only one peak, but the others 
have two may be explained in the following way. Since 
Xo is a fluctuation of the overlap with the native state 
it reflects the behavior of the system in the vicinity of 
the native basin and it should have, therefore, only one 
peak at Tf. The remaining susceptibilities related to the 
chain compactness would have two maxima at Tf and Tg 
where the topology changes abruptly. 

The temperature dependence of Xi ^ and Gy for the 
three-dimensional sequence H is shown in Fig. 5. In 
this case we have the basin 5c = 0.12cr and Tf = 0.24 ± 
0.02 [p^p9|. Since Xd,Xba and Xda do not display any 
peak in the relevant temperature interval, they cannot 
be used to determine Tf. It is also true for D (their 
extremal points are located at T = 0.325 ± 0.025 which 
is far from Tf). The overlap susceptibility Xo has the 
maximum at T^^ — 0.275 ± 0.025. Within the error bars 
T^^ may be identified as Tf but such an estimate is less 
accurate compared to the case of G and R\ Furthermore 
its computation involves the native basin 5c. 

From the results presented in Figs. 2 - 5 we pro- 
pose the following criterion for foldability: a good folder 
would have only one peak in the derivatives of distances 
to the native state with respect to temperature whereas 
a bad folder has two. Our criterion is compatible with 
the fact that for the good folders the folding takes place 
just after the collapse transition. A three state scenario 
of folding is, however, more suitable for the bad fold- 
ers P|. Thus, one can still gain information about the 
foldability for H without the native basin 5c. 

The question we ask now is why H is so different from 
the other sequences. The main difference is that its na- 
tive conformation is not compact. It results in the non- 
trivial dependence of Rg on T: Dg does not develop a 
maximum but rather a minimum around the the collapse 
transition. This leads to the anomal behavior of Xd, Xba 



and Xha- 



X 

o 
o 



u 



0.3 



0.2 



0.1 



0.0 



H lOx,^^ 


^~/ 


6^ = 0. I2a f / 


/Xd 


^^^o//U\^---.., 


-...Cv 


- J // lox^ 




_ o D„ A 


'^ Dda 


° D^ / ^ 


^ D. 


o SD,,^ / j^f^^^ 


^"^-e-o 


^^^y^^^ 


'"'^^ 


tti>^ T** 


fc^ 


'. A-"' 




1 1 1 1 


1 







0.0 0.2 0.4 0.6 

T 

FIG. 5. The same as in Fig. 2 but for sequence H. The 
native basin is equal to &c = 0.12(j. The resuhs are averaged 
over 200 molecular dynamics trajectories. 

In conclusion, we have introduced several new struc- 
tural susceptibilities as fluctuations of distances to the 
native conformation. If the native confomation of pro- 
teins is compact Tf may be obtained from the peak of x 
and the native basin is not required. For sequences with 
non-compact native conformation be is not needed to es- 
tablish the foldability but the accurate estimate of Tf 
should involve it. The number of peaks in the deriva- 
tives of distances to the native state with respect to 
temperature, when plotted against T, may serve as a 
tool to distinguish between good and bad folders. The 
question of why the susceptibilities x ^^nd the derivatives 
D behave in the same way if the native conformations 
are compact remains to be elucidated. Nevertheless, in 
agreement with other studies (see, for instance, Ref. pfl] 
and references there), our results indicate that the topol- 
ogy of the native state plays a crucial role in the folding 
process. 
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